A MMSE estimator in mel-cepstral domain for robust large vocabulary automatic speech recognition using uncertainty propagation

Authors

  • Ramón Fernández Astudillo
  • Reinhold Orglmeister
Abstract

Uncertainty propagation techniques make automatic speech recognition more robust by modeling, in probabilistic form, the information that is missing after speech enhancement in the short-time Fourier transform (STFT) domain. This information is then propagated into the feature domain, where recognition takes place, and combined with observation uncertainty techniques such as uncertainty decoding. In this paper we show how uncertainty propagation can also be used to yield minimum mean square error (MMSE) estimates of the clean speech directly in the recognition domain. We develop an MMSE estimator for the Mel-cepstral features by propagating the Wiener filter posterior distribution and show that it outperforms conventional MMSE methods in the STFT domain on the AURORA4 large vocabulary test environment.
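
As a rough illustration of the idea in the abstract (not the authors' implementation), the sketch below uses the standard result that, for a Wiener filter with complex Gaussian speech and noise priors, the posterior of each clean STFT coefficient given the noisy coefficient Y is complex Gaussian with mean G·Y and variance G·λ_D, where G = λ_X / (λ_X + λ_D). It then approximates the MMSE estimate in the Mel-cepstral domain as the posterior mean of the features. Monte Carlo sampling is used here as a simple stand-in for whatever propagation formulas the paper derives. The function names, the toy filterbank (triangles spaced linearly, not on a true Mel scale), and all sizes are assumptions made for this example.

```python
import numpy as np


def wiener_posterior(Y, lambda_x, lambda_d):
    """Mean and variance of the complex Gaussian posterior p(X | Y)."""
    gain = lambda_x / (lambda_x + lambda_d)
    return gain * Y, gain * lambda_d


def toy_filterbank(n_mel, n_bins):
    """Hypothetical triangular filterbank, linearly spaced (not a real Mel scale)."""
    centers = np.linspace(0, n_bins - 1, n_mel + 2)
    bins = np.arange(n_bins)
    fb = np.zeros((n_mel, n_bins))
    for i in range(n_mel):
        left, mid, right = centers[i], centers[i + 1], centers[i + 2]
        rising = (bins - left) / (mid - left)
        falling = (right - bins) / (right - mid)
        fb[i] = np.clip(np.minimum(rising, falling), 0.0, None)
    return fb


def dct_matrix(n_ceps, n_mel):
    """Unnormalized DCT-II matrix mapping log filterbank energies to cepstra."""
    k = np.arange(n_ceps)[:, None]
    m = np.arange(n_mel)[None, :]
    return np.cos(np.pi * k * (m + 0.5) / n_mel)


def mfcc_mmse_monte_carlo(Y, lambda_x, lambda_d, mel_fb,
                          n_ceps=13, n_samples=2000, seed=0):
    """Approximate the posterior mean and variance of the cepstral features.

    Assumption: sampling replaces closed-form uncertainty propagation.
    """
    rng = np.random.default_rng(seed)
    mean, var = wiener_posterior(Y, lambda_x, lambda_d)
    # Circular complex Gaussian: real and imaginary parts each carry var / 2.
    std = np.sqrt(var / 2.0)
    noise = (rng.standard_normal((n_samples, Y.size))
             + 1j * rng.standard_normal((n_samples, Y.size)))
    X = mean + std * noise                      # posterior samples of clean STFT
    mel = (np.abs(X) ** 2) @ mel_fb.T           # filterbank energies per sample
    ceps = np.log(mel + 1e-12) @ dct_matrix(n_ceps, mel_fb.shape[0]).T
    return ceps.mean(axis=0), ceps.var(axis=0)


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    n_bins = 129
    Y = rng.standard_normal(n_bins) + 1j * rng.standard_normal(n_bins)
    fb = toy_filterbank(20, n_bins)
    mu, var = mfcc_mmse_monte_carlo(Y, np.full(n_bins, 2.0),
                                    np.full(n_bins, 1.0), fb)
    print(mu.shape, var.shape)
```

The returned variance is the kind of per-feature uncertainty that observation uncertainty techniques such as uncertainty decoding could consume. Note that the posterior mean of the features generally differs from the features of the Wiener estimate G·Y, because the power, logarithm, and filterbank steps are nonlinear.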

Similar resources

Use of speech presence uncertainty with MMSE spectral energy estimation for robust automatic speech recognition

In this paper, we investigate the use of the minimum mean square error (MMSE) spectral energy estimator for use in environment-robust automatic speech recognition (ASR). In the past, it has been common to use the MMSE log-spectral amplitude estimator for this task. However, this estimator was originally derived under subjective human listening criteria. Therefore its complex suppression rule may...

Regularized minimum variance distortionless response-based cepstral features for robust continuous speech recognition

In this paper, we present robust feature extractors that incorporate a regularized minimum variance distortionless response (RMVDR) spectrum estimator instead of the discrete Fourier transform-based direct spectrum estimator, used in many front-ends including the conventional MFCC, to estimate the speech power spectrum. Direct spectrum estimators, e.g., single tapered periodogram, have high var...

MMSE estimation of log-filterbank energies for robust speech recognition

In this paper, we derive a minimum mean square error log-filterbank energy estimator for environment-robust automatic speech recognition. While several such estimators exist within the literature, most involve trade-offs between simplifications of the log-filterbank noise distortion model and analytical tractability. To avoid this limitation, we extend a well known spectral domain noise distort...

Improving the performance of MFCC for Persian robust speech recognition

Mel-frequency cepstral coefficients (MFCCs) are the most widely used features in speech recognition, but they are very sensitive to noise. In this paper, to achieve satisfactory performance in automatic speech recognition (ASR) applications, we introduce a new noise-robust set of MFCC vectors estimated through the following steps. First, spectral mean normalization is a pre-processing which applies to t...

Combating reverberation in large vocabulary continuous speech recognition

Reverberation leads to high word error rates (WERs) for automatic speech recognition (ASR) systems. This work presents robust acoustic features motivated by subspace modeling and human speech perception for use in large vocabulary continuous speech recognition (LVCSR). We explore different acoustic modeling strategies and language modeling techniques, and demonstrate that robust features with a...

Publication date: 2010